A Gold Standard Maithili Raw Text Corpus Vol. II

Catalogue Number: 1512
Stock: In Stock


Dataset Description

8,11,680 Words | 54 Titles | XML format | 3 Domains | 21 Sub-categories

 The Maithili Raw Text Corpus offers a window into the colloquialisms, idioms, regional vocabulary, and grammar that are essential to establishing frameworks for linguistic processing. It is an extensive repository of the linguistic elements found in Maithili textual materials. 

 The corpus can be broadly classified into literary and non-literary texts. The data was collected from books and magazines, verified against the original texts, and then archived. The corpus is encoded in machine-readable form and stored in a standard format: the text is encoded in Unicode and stored as XML, with metadata embedded alongside the data. The corpus was created from contemporary texts that were typed and digitized. 
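
 As an illustration of working with Unicode text stored as XML with embedded metadata, the sketch below reads one document and counts its words. The element names (doc, meta, title, domain, body), the sample values, and the whitespace-based word count are assumptions made for this example only; they are not the LDC-IL schema or counting method, which the corpus documentation defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The tag names and values below are
# placeholders for illustration, not the actual LDC-IL XML schema.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <meta>
    <title>Sample title</title>
    <domain>Literature</domain>
  </meta>
  <body>हम मैथिली बजैत छी।</body>
</doc>
"""


def read_document(xml_bytes):
    """Return (metadata dict, body text) for one XML document."""
    root = ET.fromstring(xml_bytes)
    # Collect every child of <meta> as a tag -> text pair.
    meta = {child.tag: (child.text or "").strip() for child in root.find("meta")}
    text = (root.findtext("body") or "").strip()
    return meta, text


def count_words(text):
    """Naive word count: split on whitespace (an assumption, not the
    corpus's own tokenization)."""
    return len(text.split())


meta, text = read_document(SAMPLE_XML.encode("utf-8"))
print(meta["domain"], count_words(text))
```

 Passing UTF-8 bytes to the parser lets the XML declaration's encoding be honored; the real files may need different element paths once the schema is known.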


 A detailed explanation of the Maithili Raw Text Corpus Vol. II will be available in the Maithili Text Corpus Documentation. 

 For research citations, please cite the following:

  1. Shantanu Kumar, Ankita Tiwari, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan. 2025. A Gold Standard Maithili Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-01-9.
  2. Dr. Rejitha K. S., Dr. Narayan Kumar Choudhary. 2025. LDC-IL Corpus Insights. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-33-0.

Item specifics

  • Authors: Shantanu Kumar, Ankita Tiwari, Rajesha N., Manasa G., Dr. Narayan Kumar Choudhary, Prof. Shailendra Mohan
  • Corpus Type: Raw Text Corpus
  • Catalogue Number: 1512
  • ISBN: 978-93-48633-01-9
  • Data Source: On Field
  • Release Date: 20/03/2025
  • Terms and Conditions: General instructions for use of the resources provided by LDC-IL.
Commercial User
Non-Commercial User
LDC-IL Raw Text Corpora: An Overview
LDC-IL Raw Speech Corpora: An Overview
